Optimum Storage Allocation for Initial Loading of a File

Author

  • J A Van Der Pool
Abstract

When a file is loaded into a direct access storage device using key-to-address transformations, the number and size of storage blocks can be selected. In this study, a selection that minimizes the combined cost of storage space and accesses to the storage device is determined for the case where no record additions or deletions occur after loading. The analysis is based on the assumption that for a given set of keys, a transformation exists that gives a uniform probability distribution over the available addresses. Under this assumption, formulas are derived for the average number of overflow records and for the average number of accesses required to retrieve a record. Given these formulas, the costs are expressed as a function of storage used, number of accesses, cost per unit of storage, and cost per access. Minima are computed for a range of block sizes and operational conditions. The results seem to indicate that current file design practices are overly generous with storage space. Finally, the results are condensed in an easy-to-use approximate formula.

Introduction

Computerized data are organized as files. A file is a collection of data records; in our case, each record is uniquely identified by a key. In automatic data processing, the data items are frequently stored on a direct access storage device (DASD). A common situation in computer programs using a DASD is the following: a key is presented to the program, and the record identified by that key is to be retrieved from the storage device. A fast retrieval from a DASD is possible if the relationship between the key and the address of the record in the device is known. If such a relationship is not established, a search through the file must be made.

Two methods for establishing the relationship between key and address are in common usage (for a more extensive treatment see [1, 2]):

1. An index is maintained in the same or another storage device. For every key, the corresponding address is listed in the index.
With a suitable organization, a search through the index is much faster than a search through the file itself.

2. An algorithm defines a function $f: K \to A$ with the set K of possible keys as domain and the set A of available addresses as codomain. Such a function is often called a key-to-address transformation (KAT). Other names are hashing, hash coding, or scatter storage technique [3].

In this paper, the second method of file organization is considered. Many different KAT algorithms have been described in the literature. A systematic comparative evaluation of several algorithms has been made by Lum, et al. [4] using simulation methods. It appears that the results of different transformations vary widely and are highly dependent upon the characteristics of the set of keys.

In the ideal situation, a KAT should assign an equal number of records to each of the available addresses. Often this goal is pursued, but clearly not reached, by giving the KAT the property of "randomizing." For an analysis of the rationale of this approach see Lum, et al. [4]. It is assumed here that for a given set of keys we are able to find a perfect random transformation. By this we mean that the outcome of the assignment of an address by this KAT is a random variable with equal probability for each of the values it can assume (i.e., having a uniform probability distribution). Under this assumption, formulas are derived for the number of overflow records and the average number of accesses per record. Some of the results of [4] are then compared with the numbers computed from the formulas.

A location in the direct access storage device is designated by an address of the set A, and it may provide room for one or more records. In the latter case, several records are combined in one storage block. We use the name "bucket" for such a block. The bucket size s is the maximum number of records that can be contained in a bucket.
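The mapping from keys to bucket addresses can be illustrated with a short sketch. It is only an assumption-laden example: the function names `key_to_address` and `records_per_bucket` are hypothetical, and the use of SHA-256 as the transformation is chosen here for convenience, not taken from the paper or from the algorithms evaluated in [4].

```python
import hashlib


def key_to_address(key: str, b: int) -> int:
    """Hypothetical KAT: map a key to one of b bucket addresses (0 .. b-1).

    Assumption: a cryptographic hash reduced modulo b behaves roughly like
    the "perfect random transformation" assumed in the text.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % b


def records_per_bucket(keys, b: int) -> list[int]:
    """Count how many keys the transformation assigns to each bucket."""
    counts = [0] * b
    for key in keys:
        counts[key_to_address(key, b)] += 1
    return counts


if __name__ == "__main__":
    # Made-up keys, purely for illustration.
    keys = [f"K{i:05d}" for i in range(1000)]
    print(records_per_bucket(keys, 10))
```

Even with a well-behaved transformation, the counts per bucket are generally not all equal, so a bucket of size s can receive more than s records.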
579 NOVEMBER 1972 ALLOCATING INITIAL FILE STORAGE

The collection of b buckets provided in the DASD for storing the file is called the primary storage area. Because of the random nature of the KAT, more than s record keys may be mapped into the same address. There are different methods for storing the records that exceed the capacity of the bucket. Here, we consider the method of using a common separate overflow area for all buckets, in the same or another DASD. The overflow records belonging to a certain bucket are organized in a chain stored in the overflow area. If a record is stored in the primary storage area, a single access to the DASD is sufficient. If the record is in the overflow area, one or more additional accesses are required. In the latter case, the record is found by following pointers from the bucket to the first record, and then from record to record through the overflow area.

When this method is implemented for a specific application, the implementer can choose both b and s. If the total capacity bs is chosen large relative to the number of records, few additional accesses are required, but the storage cost is high. On the other hand, a low capacity results in high cost for accesses to the overflow storage device. In this paper we formulate a cost function for retrieving records. Minima are computed for different values of the bucket size and different operational considerations. The file is assumed to be static; that is, no deletions or additions of records occur.

Use of the Poisson Distribution

We consider n records with distinct keys mapped by means of key-to-address transformations into a primary storage area divided into b buckets with a capacity of s records each. If more than s records are assigned to a bucket, the excess records are stored in a separate overflow area common to all buckets. For each bucket with overflow records, there is in the overflow area a chain of overflow records with address pointers.
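The loading scheme just described can be explored with a small simulation. This is a minimal sketch, not the paper's method: the idealized transformation is modeled by a uniform pseudo-random draw, the name `simulate_loading` and the seed are hypothetical, and the access rule (the i-th record in a bucket's overflow chain costs 1 + i accesses, one for the bucket plus one per chain link followed) is an assumption read from the description of chained overflow above.

```python
import random
from collections import Counter


def simulate_loading(n: int, b: int, s: int, seed: int = 0):
    """Load n records into b buckets of capacity s and report
    (number of overflow records, average accesses per record)."""
    rng = random.Random(seed)
    # Assumption: each record is assigned to a bucket uniformly at random.
    counts = Counter(rng.randrange(b) for _ in range(n))

    overflow = 0
    accesses = 0
    for r in counts.values():
        in_primary = min(r, s)
        excess = r - in_primary
        overflow += excess
        # Records in the primary area need a single access each.
        accesses += in_primary
        # Assumption: the i-th chained record needs 1 + i accesses.
        accesses += sum(1 + i for i in range(1, excess + 1))
    return overflow, accesses / n


if __name__ == "__main__":
    overflow, avg = simulate_loading(n=10_000, b=1_000, s=10)
    print(f"overflow records: {overflow}, average accesses: {avg:.3f}")
```

Increasing b or s should lower both numbers, at the price of more primary storage, which is the trade-off the cost function is meant to balance.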
Assuming that we have an equal probability of assigning a record to any of the available buckets, the number of records r assigned to a bucket will have a binomial probability distribution with parameters $1/b$ and $n$:

$$b\left(r; n, \frac{1}{b}\right) = \binom{n}{r} \left(\frac{1}{b}\right)^{r} \left(1 - \frac{1}{b}\right)^{n-r}.$$

We are interested in cases where both n and b are large, and where the average number of records assigned to a bucket, $n/b = m$, is nearly equal to the bucket size s. The binomial distribution is then very close to the Poisson distribution with parameter m (see, e.g., [5], pp. 142 ff.):

$$P(r) = \frac{e^{-m} m^{r}}{r!}.$$

Here, $P(r)$ is the probability that r records are assigned to a bucket. We further introduce

$$Q(r) = \sum_{i=r}^{\infty} P(i).$$

The formulas for overflow and accesses will be expressed in these quantities. Since tables are available [6], the formulas can be used for hand as well as for machine computation.

Overflow

The frequency function $O(i)$ of the number i of records assigned to a bucket in excess of its capacity s is $O(i) = P(s + i)$ for $i \ge 1$. The mean of this distribution is:

$$T(m, s) = \sum_{i=1}^{\infty} i\, O(i) = \sum_{i=1}^{\infty} i\, P(s + i) = \sum_{r=s+1}^{\infty} (r - s)\, P(r), \tag{1}$$
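The mean overflow per bucket, T(m, s), can be evaluated numerically from the Poisson probabilities. The following is a sketch under the Poisson approximation; the function names `poisson_pmf` and `mean_overflow` and the truncation tolerance `tol` are assumptions made for illustration and do not come from the paper.

```python
import math


def poisson_pmf(r: int, m: float) -> float:
    """P(r) = exp(-m) * m**r / r!, computed in log space to avoid overflow.

    Requires m > 0.
    """
    return math.exp(-m + r * math.log(m) - math.lgamma(r + 1))


def mean_overflow(m: float, s: int, tol: float = 1e-12) -> float:
    """Approximate T(m, s) = sum over r > s of (r - s) * P(r).

    The infinite sum is truncated once r exceeds m and the current
    term falls below tol (an assumed stopping rule).
    """
    total = 0.0
    r = s + 1
    while True:
        term = (r - s) * poisson_pmf(r, m)
        total += term
        if r > m and term < tol:
            break
        r += 1
    return total


if __name__ == "__main__":
    m, s = 10.0, 10
    t = mean_overflow(m, s)
    # Mean overflow per bucket and the share of all records that overflow.
    print(f"T({m}, {s}) = {t:.4f}, overflow fraction = {t / m:.4f}")
```

For s = 1 the sum reduces to m - 1 + e^{-m}, which gives a quick sanity check on the truncation.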

Similar resources

Optimum Storage Allocation for a File in Steady State

A file of fixed-length records in auxiliary storage using a key-to-address transformation to assign records to addresses is considered. The file is assumed to be in steady state, that is that the rates of additions to and of deletions from the file are equal. The loading factors that minimize file maintenance costs in terms of storage space and additional accesses are computed for different buc...

Optimum Design of Scallop Domes for Dynamic Time History Loading by Harmony Search-Firefly Algorithm

This paper presents an efficient meta-heuristic algorithm for optimization of double-layer scallop domes subjected to earthquake loading. The optimization is performed by a combination of harmony search (HS) and firefly algorithm (FA). This new algorithm is called harmony search firefly algorithm (HSFA). The optimization task is achieved by taking into account geometrical and material nonlinear...

Yard Crane Pools and Optimum Layouts for Storage Yards of Container Terminals

As more and more container terminals open up all over the world, competition for business is becoming very intense for container terminal operators. They are finding out that even to keep their existing Sea Line customers, they have to make them happy by offering higher quality service. The quality of service they can provide depends on their operating policies and the design of the terminal la...

Virtual Allocation: A Scheme for Flexible Storage Allocation

Traditional file systems allocate and tie up entire storage space at the time the file system is created. This creates a situation where one file system could be running out of space, while another file system has ample unused storage space. In such environment, storage management flexibility is seriously hampered. This paper presents virtual allocation, a scheme for flexible storage allocation...

Simultaneous Allocation Of Reliability & Redundancy Using Minimum Total Cost Of Ownership Approach

This paper addresses the mixed integer reliability redundancy allocation problems to determine simultaneous allocation of optimal reliability and redundancy level of components based on three objective goals. System engineering principles suggest that the best design is the design that maximizes the system operational effectiveness and at the same time minimizes the total cost of ownership (TCO...

Flexible allocation and space management in storage systems

Flexible Allocation and Space Management in Storage Systems. (May 2007) Sukwoo Kang, B.S., Seoul National University; M.S., Seoul National University Chair of Advisory Committee: Dr. A. L. Narasimha Reddy In this dissertation, we examine some of the challenges faced by the emerging networked storage systems. We focus on two main issues. Current file systems allocate storage statically at the ti...

Publication date: 2002